Near-Earth Object (NEO) observation 
Design reference asteroids 

Impact modelling 

Decision support 


Mitigation action 


In the U.S., the NASA Planetary Defense Coordination Office 
(PDCO) was established in 2016 to study the mitigation of 
potential Near-Earth Object (NEO) impacts on our home 
planet. 


Image source: http://www.universetoday.com/128347/nasa-discovers-72-new-never-seen-neos/ 


Motivation for an Information Framework 


Information about detecting, characterizing, and mitigating NEO threats is 
dispersed (e.g. publications, briefings). 


An overall architecture to facilitate the collaborations and integrate the different 
capabilities to achieve the most sensible, executable options for mitigation 


A cyberinfrastructure to capture mitigation trades, analyses, model output, risk 
projections, and mitigation mission design concepts 


Discovery and easy access to knowledge and expert opinion within the project 
team, as well as factoring in related information from other research and 
analysis activities 
Domain-specific vs. general-purpose 


Indexed content 
— Google searches nearly the entire Internet 
— The framework is PD-specific: it indexes Planetary Defense related info 


Knowledge base 
— Google’s Knowledge Graph is based on generic sources such as Wikipedia 


— The framework will create a PD ontology aided by domain experts, combined with 
machine learning and Natural Language Processing (NLP) results 


Decision makers can have easy access to required information and quality 
knowledge 
• Named entity recognition (NER) 
• Relation extraction (RE) 
Framework Gateway 
• Web Portal: http://pd.cloud.gmu.edu/ 


• User management, document archiving, vocabulary editing, 
web crawling, search engine 
User management 


User roles: administrator, authenticated user, anonymous 
user 


Manage access control with permissions and user roles 
Assign permissions and roles to users 


Ban an IP address - The Ban module allows administrators 
to ban visits to their site from an individual IP address or a 
range of IP addresses. 
Ongoing research 


• Domain-specific crawling 
• Knowledge extraction from plain text 
Simplest approach: filter web pages using a keyword list (e.g. 
NEO, asteroid, Bennu, ...) composed by domain experts. 


[Figure: distribution of relevant pages] 


Problems: 
• Expensive 
• Difficult to exhaust 
• Difficult to assign weights to different keywords 
• Treats all web pages equally (a page on the NASA website and 
a random one) 
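
The keyword-list approach can be sketched in a few lines. This is a minimal illustration, not the project's crawler code: the function names and the `min_hits` parameter are hypothetical, and the keyword set contains only the three example terms named on the slide.

```python
import re

# Example keywords taken from the slide; a real list would be
# composed by domain experts and be much longer.
KEYWORDS = {"neo", "asteroid", "bennu"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_relevant(page_text, keywords=KEYWORDS, min_hits=1):
    """Return True if at least `min_hits` distinct keywords occur in the page.

    `min_hits` is a hypothetical knob added for illustration.
    """
    hits = keywords.intersection(tokenize(page_text))
    return len(hits) >= min_hits
```

The sketch also makes the listed problems visible: every keyword counts the same, and the source of the page plays no role in the decision.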


Image source: http://www.seminarsonly.com/computer%20science/focused-web-crawling-for-e-learning-content.php 


Existing tools in open-source crawlers (e.g. Nutch): 
• Link-based 

— Scoring links (OPIC, PageRank scoring) 

— Breadth first or Depth first crawl 


• Content-based 
— URL, mimetype filter 
— Cosine Similarity scoring filter (what we are using) 
— Naive Bayes parse filter 


Image source: https://en.wikipedia.org/wiki/PageRank 
• Combine content and link-based scoring to boost the authoritative pages 
• Dynamically update/grow the vocab using info (e.g. title) from the web pages 
• Weight keywords based on frequency clustering (i.e. more frequently seen 
terms have more weight) 


[Figure: crawl workflow with a keywords extractor (e.g. title) and a 
"Store/Index pages, Update links" step] 


Engage the community to help with the evaluation 
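
One way these ideas could fit together is sketched below. It is an assumption-laden illustration rather than the project's method: plain relative frequency stands in for the "frequency clustering" named on the slide, the link score is assumed to be already normalized to [0, 1] (for example from PageRank or OPIC), and `alpha` is a hypothetical mixing parameter.

```python
import re
from collections import Counter


def _tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_weights(titles, vocab):
    """Weight each vocabulary term by its relative frequency in page titles."""
    counts = Counter(tok for title in titles for tok in _tokens(title) if tok in vocab)
    total = sum(counts.values())
    if total == 0:
        return {term: 0.0 for term in vocab}
    return {term: counts[term] / total for term in vocab}


def content_score(text, weights):
    """Sum the weights of the vocabulary terms that appear in the text."""
    present = set(_tokens(text))
    return sum(w for term, w in weights.items() if term in present)


def combined_score(content, link, alpha=0.5):
    """Linear blend of a content score and a normalized link score."""
    return alpha * content + (1 - alpha) * link
```

With this blend, a page from an authoritative site (high link score) outranks a random page with the same content score.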
Goal: Extract structured information from unstructured web 
pages and user-uploaded documents 


Relation extraction in NLP: finding semantic triples (SPO) 
from sentences 


Predicate 
The UV Index is a measure of the intensity of UV rays from the Sun. 


Subject Object 


Pattern-based, supervised, semi-supervised, and open 
information extraction 


Hand-written patterns 


• “Y such as X” 
• “such Y as X” 
• “X or other Y” 
• “Y including X” 


+ Tend to be high-precision 
+ Tailored to specific domains 
- Human patterns are often low-recall 
- Hard to be exhaustive 
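
The four hand-written patterns above can be approximated with regular expressions. The sketch below is deliberately simplified: X and Y are each captured as a single word, whereas a practical extractor would match full noun phrases. The names `HEARST_PATTERNS` and `extract_is_a` are hypothetical.

```python
import re

# Each pattern captures a hyponym X and a hypernym Y as single words.
HEARST_PATTERNS = [
    re.compile(r"\b(?P<y>\w+),? such as (?P<x>\w+)"),    # "Y such as X"
    re.compile(r"\bsuch (?P<y>\w+) as (?P<x>\w+)"),      # "such Y as X"
    re.compile(r"\b(?P<x>\w+),? or other (?P<y>\w+)"),   # "X or other Y"
    re.compile(r"\b(?P<y>\w+),? including (?P<x>\w+)"),  # "Y including X"
]


def extract_is_a(sentence):
    """Return (X, "is-a", Y) triples for every pattern match in the sentence."""
    triples = []
    for pattern in HEARST_PATTERNS:
        for match in pattern.finditer(sentence):
            triples.append((match.group("x"), "is-a", match.group("y")))
    return triples
```

The trade-off listed above shows up directly: each match is likely to be correct, but any sentence phrased differently is missed.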
• Recently published by Univ. of Washington 

• Extracts relations from sentences with no training data and no list of 
relations (unsupervised) 

• Self-learning process, syntactic and lexical/semantic patterns 


The U.S. president Barack Obama gave his speech on Tuesday to thousands of people. 
(Barack Obama, is the president of, the U.S.) 

(Barack Obama, gave, his speech) 

(Barack Obama, gave his speech, on Tuesday) 

(Barack Obama, gave his speech, to thousands of people) 


Gabor Angeli, Melvin Johnson Premkumar, and Christopher D. Manning. Leveraging Linguistic Structure For Open Domain Information Extraction. In 
Proceedings of the Association for Computational Linguistics (ACL), 2015. 
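
Once triples like those above are extracted, they can be stored and queried as structured data. The sketch below is only an illustration of that idea, using the four example triples from the slide; the `Triple` type and the helper names are hypothetical and do not reflect the framework's actual storage layer.

```python
from collections import defaultdict, namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# The four example triples from the slide.
TRIPLES = [
    Triple("Barack Obama", "is the president of", "the U.S."),
    Triple("Barack Obama", "gave", "his speech"),
    Triple("Barack Obama", "gave his speech", "on Tuesday"),
    Triple("Barack Obama", "gave his speech", "to thousands of people"),
]


def index_by_subject(triples):
    """Group triples by subject so they can be looked up quickly."""
    index = defaultdict(list)
    for triple in triples:
        index[triple.subject].append(triple)
    return index


def objects_of(index, subject, predicate):
    """Return all objects linked to `subject` by `predicate`."""
    return [t.object for t in index.get(subject, []) if t.predicate == predicate]
```

For example, `objects_of(index_by_subject(TRIPLES), "Barack Obama", "gave")` returns the objects of every triple with that subject and predicate.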
the GHRSST is a truly international project with over $18 Million USToday 

Jason-1 has a repeat period of approximately 10 days with 254 passes per cycle 

Jason-3 is capable of measuring significant wave height, sigma naught (sigma0), dry and wet tropos 

The Aquarius has 3 radiometer beams in push-broom alignment with footprint resolutions of 76 km 

instrument 

Jason-3 has a repeat period of approximately 10 days with 254 passes per cycle 

Jason-1 is capable of measuring significant wave height, sigma0, dry and wet troposphere and ionos 
Level-2 data refer to monthly estimates of spherical harmonic coefficients of the Earth gravity field 

no downlink signal was detected At the beginning of the next contact at 0249 UTC 

sensors included a CTD at the near-surface and another at 6 m depthFor SPURS-1 I 
• Some are reasonable, some are noise 
• Working on reducing noise/identifying reasonable results 
• The proposed architecture framework benefits the PD community by 


— Providing discovery and easy access to the knowledge and expert opinion 
within the project team 


— Maximizing the linkage between different organizations, scientists, engineers, 
decision makers, and citizens 
• Next steps 
— Develop a knowledge base & search ranking for NEO mitigation resources 


— Investigate a knowledge reasoning model for potential mitigation by 
assimilating existing scenarios 


— Build a 4D visualization tool based on new datasets and existing tools 